NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Landmark-Aware Visual Navigation Dataset for Map Representation Learning

Johnson, Faith; Data, Kristin; Cao, Bryan Bo; Jain, Shubham; Ashok, Ashwin (March 2025, Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction)

Free, publicly-accessible full text available March 4, 2026
A Lightweight Measure of Classification Difficulty from Application Dataset Characteristics

Cao, Bryan Bo; Sharma, Abhinav; O’Gorman, Lawrence; Coss, Michael; Jain, Shubham (November 2024, 27th International Conference on Pattern Recognition (ICPR))

Although accuracy and computation benchmarks are widely available to help choose among neural network models, these are usually trained on datasets with many classes, and do not give a good idea of performance for few (<10) classes. The conventional procedure to predict performance involves repeated training and testing on the different models and dataset variations. We propose an efficient cosine similarity-based classification difficulty measure S that is calculated from the number of classes and intra- and inter-class similarity metrics of the dataset. After a single stage of training and testing per model family, relative performance for different datasets and models of the same family can be predicted by comparing difficulty measures – without further training and testing. Our proposed method is verified by extensive experiments on 8 CNN and ViT models and 7 datasets. Results show that S is highly correlated to model accuracy with correlation coefficient r=0.796, outperforming the baseline Euclidean distance at r=0.66. We show how a practitioner can use this measure to help select an efficient model 6 to 29x faster than through repeated training and testing. We also describe using the measure for an industrial application in which options are identified to select a model 42% smaller than the baseline YOLOv5-nano model, and if class merging from 3 to 2 classes meets requirements, 85% smaller.
more » « less
Full Text Available
OVIDA: Orchestrator for Video Analytics on Disaggregated Architecture

https://doi.org/10.1109/SEC62691.2024.00019

Singh, Manavjeet; Rachuri, Sri Pramodh; Cao, Bryan Bo; Sharma, Abhinav; Bhumireddy, Venkata; Bronzino, Francesco; Das, Samir R; Gandhi, Anshul; Jain, Shubham (December 2024, IEEE)

Full Text Available
ViFiT: Reconstructing Vision Trajectories from IMU and Wi-Fi Fine Time Measurements

https://doi.org/10.1145/3615984.3616503

Cao, Bryan Bo; Alali, Abrar; Liu, Hansi; Meegan, Nicholas; Gruteser, Marco; Dana, Kristin; Ashok, Ashwin; Jain, Shubham (October 2023, ACM)

Tracking subjects in videos is one of the most widely used functions in camera-based IoT applications such as security surveillance, smart city traffic safety enhancement, vehicle to pedestrian communication and so on. In computer vision domain, tracking is usually achieved by first detecting subjects, then associating detected bounding boxes across video frames. Typically, frames are transmitted to a remote site for processing, incurring high latency and network costs. To address this, we propose ViFiT, a transformerbased model that reconstructs vision bounding box trajectories from phone data (IMU and Fine Time Measurements). It leverages a transformer’s ability of better modeling long-term time series data. ViFiT is evaluated on Vi-Fi Dataset, a large-scale multimodal dataset in 5 diverse real world scenes, including indoor and outdoor environments. Results demonstrate that ViFiT outperforms the state-of-the-art approach for cross-modal reconstruction in LSTM Encoder-Decoder architecture X-Translator and achieves a high frame reduction rate as 97.76% with IMU and Wi-Fi data.
more » « less
Full Text Available
ViTag: Online WiFi Fine Time Measurements Aided Vision-Motion Identity Association in Multi-person Environments

https://doi.org/10.1109/SECON55815.2022.9918171

Cao, Bryan Bo; Alali, Abrar; Liu, Hansi; Meegan, Nicholas; Gruteser, Marco; Dana, Kristin; Ashok, Ashwin; Jain, Shubham (September 2022, 2022 19th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON))

In this paper, we present ViTag to associate user identities across multimodal data, particularly those obtained from cameras and smartphones. ViTag associates a sequence of vision tracker generated bounding boxes with Inertial Measurement Unit (IMU) data and Wi-Fi Fine Time Measurements (FTM) from smartphones. We formulate the problem as association by sequence to sequence (seq2seq) translation. In this two-step process, our system first performs cross-modal translation using a multimodal LSTM encoder-decoder network (X-Translator) that translates one modality to another, e.g. reconstructing IMU and FTM readings purely from camera bounding boxes. Second, an association module finds identity matches between camera and phone domains, where the translated modality is then matched with the observed data from the same modality. In contrast to existing works, our proposed approach can associate identities in multi-person scenarios where all users may be performing the same activity. Extensive experiments in real-world indoor and outdoor environments demonstrate that online association on camera and phone data (IMU and FTM) achieves an average Identity Precision Accuracy (IDP) of 88.39% on a 1 to 3 seconds window, outperforming the state-of-the-art Vi-Fi (82.93%). Further study on modalities within the phone domain shows the FTM can improve association performance by 12.56% on average. Finally, results from our sensitivity experiments demonstrate the robustness of ViTag under different noise and environment variations.
more » « less
Full Text Available
Vi-Fi: Associating Moving Subjects across Vision and Wireless Sensors

https://doi.org/10.1109/IPSN54338.2022.00024

Liu, Hansi; Alali, Abrar; Ibrahim, Mohamed; Cao, Bryan Bo; Meegan, Nicholas; Li, Hongyu; Gruteser, Marco; Jain, Shubham; Dana, Kristin; Ashok, Ashwin; et al (May 2022, 2022 21st ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN))

Full Text Available

Search for: All records